?
In this worksheet, the Pre-processing of 25 publicly available Bone Marrow samples is described. This includes data acquisition, Quality Control and Cell selection, Normalization and Feature Selection of each individual sample. Afterwards all samples are integrated with the Seurat Anchor Function.
After reducing the complexity with PCA, the integrated data is visualized and evaluated.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(Seurat)
library(ggplot2)
packageVersion("dplyr")
[1] '0.8.99.9003'
packageVersion("Seurat")
[1] '3.1.5'
The read count matrices, genes and barcodes of 25 scRNA Bone Marrow samples were collectivly downloaded from the National Center for Biotechnology Information’s Gene Expression Omnibus (GSE120221), provided by Oetjen et al. 2018.
Each sample was loaded into one Seurat object, stored in a named list (Sample ID and Letter were provided by the directory name of each sample)
data_dir1 <- list.files("~/ExampleSets/BoneMarrowNormal/Data", full.names = TRUE)
names(data_dir1) <- list.files("~/ExampleSets/BoneMarrowNormal/Data")
bmmc_Raw.data <- Read10X(data.dir = data_dir1)
bmmc_Raw <- CreateSeuratObject(counts = bmmc_Raw.data, project = "BoneMarow", min.cells = 3, min.features = 200)
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Information about the percentage of mitochondrial RNA per cell are extracted and plotted for each sample.
#Mitochondrial genes selected
bmmc_Raw[["percent.mt"]] <- PercentageFeatureSet(bmmc_Raw, pattern = "^MT-")
# Visualize as a violin plot
VlnPlot(bmmc_Raw, features = "percent.mt", ncol = 1, y.max = 15) + NoLegend()
Warning: Removed 124 rows containing non-finite values (stat_ydensity).
Warning: Removed 124 rows containing missing values (geom_point).
The aggregated correlation of amount of RNA and mitochondrial RNA as well as number of Feature is plotted as a further QC
plot1 <- FeatureScatter(bmmc_Raw, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(bmmc_Raw, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
CombinePlots(plots = list(plot1, plot2))
Warning: CombinePlots is being deprecated. Plots should now be combined using
the patchwork system.
Cells with a low amount of features or a high amount of mitochondrial RNA are removed. After evaluation the tresholds of the original publication were applied (MT > 8%, less then 500 RNA-Features).
bmmc_Raw <- subset(bmmc_Raw, subset = nFeature_RNA > 500 & percent.mt < 8)
plot1 <- FeatureScatter(bmmc_Raw, feature1 = "nCount_RNA", feature2 = "percent.mt") + NoLegend()
plot2 <- FeatureScatter(bmmc_Raw, feature1 = "nCount_RNA", feature2 = "nFeature_RNA") + NoLegend()
CombinePlots(plots = list(plot1, plot2))
Warning: CombinePlots is being deprecated. Plots should now be combined using
the patchwork system.
length(bmmc_Raw@active.ident)
[1] 76645
This number of filtered cells is also reported in the original publication.
Each sample was individually normalized and the top 2000 feautures were selected. Each normalized sample was stored in a list.
bmmc_Raw.list <- SplitObject(bmmc_Raw, split.by = "orig.ident")
bmmc_Raw.list <- lapply(X = bmmc_Raw.list, FUN = function(x) {
x <- NormalizeData(x, verbose = FALSE)
x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000, verbose = FALSE)
})
###2.4 Integration
The 25 samples were combined with the Seurat “FindIntegrationAnchors” and “IntegrateData” method (See Stuart&Butler et al. 2019), using the first 30 Principal components. Remarkly this computational method was not available during the original BMMC study.
bmmc.anchors <- FindIntegrationAnchors(object.list = bmmc_Raw.list, dims = 1:30)
bmmc.combined <- IntegrateData(anchorset = bmmc.anchors, dims = 1:30)
Warning: Adding a command log without an assay associated with it
##3 Dimension reduction
###3.1 Scaling
The integrated dataset is scaled (zero-centered and Scaled)
DefaultAssay(object = bmmc.combined) <- "integrated"
bmmc.combined <- ScaleData(object = bmmc.combined, verbose = FALSE)
###3.2 PCA The pincipal components of the integrated data set are calculated and the top genes contributing for the first two PCs are presented.
bmmc.combined <- RunPCA(bmmc.combined, verbose = FALSE)
VizDimLoadings(bmmc.combined, dims = 1:2, reduction = "pca")
The PCA dimension reduction is shown for the first two PCs:
DimPlot(bmmc.combined, reduction = "pca")
The PC components which explained most of the variance are selected based on the ElbowPlot shown below. In this case PCs after PC 20, each PC explains less than 2% of the variance.
ElbowPlot(bmmc.combined, ndims = 50) + geom_hline(yintercept=2)
###4.1 UMAP
UMAP is calculated based on the first 20 PC dimensions
bmmc.combined <- RunUMAP(bmmc.combined,reduction = "pca", dims = 1:20)
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
16:03:42 UMAP embedding parameters a = 0.9922 b = 1.112
16:03:42 Read 76645 rows and found 20 numeric columns
16:03:42 Using Annoy for neighbor search, n_neighbors = 30
16:03:42 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
16:04:00 Writing NN index file to temp file /tmp/RtmpBQd4YX/filef567352fdfa7
16:04:00 Searching Annoy index using 1 thread, search_k = 3000
16:04:45 Annoy recall = 100%
16:04:45 Commencing smooth kNN distance calibration using 1 thread
16:04:50 Initializing from normalized Laplacian + noise
16:05:20 Commencing optimization for 200 epochs, with 3839614 positive edges
16:07:27 Optimization finished
DimPlot(bmmc.combined, reduction = "umap")
###4.2 TSNE TSNE is calculated the same way.
bmmc.combined <- RunTSNE(bmmc.combined, dims = 1:20)
DimPlot(bmmc.combined, reduction = "tsne")
The object is stored for further processing
saveRDS(bmmc.combined, file = "./StoredRObj/bmmc_PreProc_Ref2.rds")
sessioninfo::package_info()
package * version date lib source
ape 5.3 2019-03-17 [2] CRAN (R 3.6.0)
assertthat 0.2.1 2019-03-21 [2] CRAN (R 3.6.0)
cli 2.0.2 2020-02-28 [2] CRAN (R 3.6.0)
cluster 2.1.0 2019-06-19 [2] CRAN (R 3.6.0)
codetools 0.2-16 2018-12-24 [3] CRAN (R 3.6.0)
colorspace 1.4-1 2019-03-18 [2] CRAN (R 3.6.0)
cowplot 1.0.0 2019-07-11 [2] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [2] CRAN (R 3.6.0)
data.table 1.12.8 2019-12-09 [2] CRAN (R 3.6.0)
digest 0.6.25 2020-02-23 [2] CRAN (R 3.6.0)
dplyr * 0.8.99.9003 2020-05-12 [2] Github (tidyverse/dplyr@75a29ef)
ellipsis 0.3.0 2019-09-20 [2] CRAN (R 3.6.0)
evaluate 0.14 2019-05-28 [2] CRAN (R 3.6.0)
fansi 0.4.1 2020-01-08 [2] CRAN (R 3.6.0)
farver 2.0.3 2020-01-16 [2] CRAN (R 3.6.0)
fitdistrplus 1.0-14 2019-01-23 [2] CRAN (R 3.6.0)
future 1.17.0 2020-04-18 [2] CRAN (R 3.6.0)
future.apply 1.5.0 2020-04-17 [2] CRAN (R 3.6.0)
generics 0.0.2 2018-11-29 [2] CRAN (R 3.6.0)
ggplot2 * 3.3.0 2020-03-05 [2] CRAN (R 3.6.0)
ggrepel 0.8.2 2020-03-08 [2] CRAN (R 3.6.0)
ggridges 0.5.2 2020-01-12 [2] CRAN (R 3.6.0)
globals 0.12.5 2019-12-07 [2] CRAN (R 3.6.0)
glue 1.4.0 2020-04-03 [2] CRAN (R 3.6.0)
gridExtra 2.3 2017-09-09 [2] CRAN (R 3.6.0)
gtable 0.3.0 2019-03-25 [2] CRAN (R 3.6.0)
htmltools 0.4.0 2019-10-04 [2] CRAN (R 3.6.0)
htmlwidgets 1.5.1 2019-10-08 [2] CRAN (R 3.6.0)
httr 1.4.1 2019-08-05 [2] CRAN (R 3.6.0)
ica 1.0-2 2018-05-24 [2] CRAN (R 3.6.0)
igraph 1.2.5 2020-03-19 [2] CRAN (R 3.6.0)
irlba 2.3.3 2019-02-05 [2] CRAN (R 3.6.0)
jsonlite 1.6.1 2020-02-02 [2] CRAN (R 3.6.0)
KernSmooth 2.23-15 2015-06-29 [3] CRAN (R 3.6.0)
knitr 1.28 2020-02-06 [2] CRAN (R 3.6.0)
labeling 0.3 2014-08-23 [2] CRAN (R 3.6.0)
lattice 0.20-38 2018-11-04 [3] CRAN (R 3.6.0)
lazyeval 0.2.2 2019-03-15 [2] CRAN (R 3.6.0)
leiden 0.3.3 2020-02-04 [2] CRAN (R 3.6.0)
lifecycle 0.2.0 2020-03-06 [2] CRAN (R 3.6.0)
listenv 0.8.0 2019-12-05 [2] CRAN (R 3.6.0)
lmtest 0.9-37 2019-04-30 [2] CRAN (R 3.6.0)
lsei 1.2-0.1 2020-05-06 [2] CRAN (R 3.6.0)
magrittr 1.5 2014-11-22 [2] CRAN (R 3.6.0)
MASS 7.3-51.6 2020-04-26 [2] CRAN (R 3.6.0)
Matrix 1.2-17 2019-03-22 [3] CRAN (R 3.6.0)
munsell 0.5.0 2018-06-12 [2] CRAN (R 3.6.0)
nlme 3.1-139 2019-04-09 [3] CRAN (R 3.6.0)
npsurv 0.4-0.1 2020-05-06 [2] CRAN (R 3.6.0)
patchwork 1.0.0 2019-12-01 [2] CRAN (R 3.6.0)
pbapply 1.4-2 2019-08-31 [2] CRAN (R 3.6.0)
pillar 1.4.4 2020-05-05 [2] CRAN (R 3.6.0)
pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 3.6.0)
plotly 4.9.2.1 2020-04-04 [2] CRAN (R 3.6.0)
plyr 1.8.6 2020-03-03 [2] CRAN (R 3.6.0)
png 0.1-7 2013-12-03 [2] CRAN (R 3.6.0)
purrr 0.3.4 2020-04-17 [2] CRAN (R 3.6.0)
R6 2.4.1 2019-11-12 [2] CRAN (R 3.6.0)
RANN 2.6.1 2019-01-08 [2] CRAN (R 3.6.0)
rappdirs 0.3.1 2016-03-28 [2] CRAN (R 3.6.0)
RColorBrewer 1.1-2 2014-12-07 [2] CRAN (R 3.6.0)
Rcpp 1.0.4.6 2020-04-09 [2] CRAN (R 3.6.0)
RcppAnnoy 0.0.16 2020-03-08 [2] CRAN (R 3.6.0)
reshape2 1.4.4 2020-04-09 [2] CRAN (R 3.6.0)
reticulate 1.15 2020-04-02 [2] CRAN (R 3.6.0)
rlang 0.4.6 2020-05-02 [2] CRAN (R 3.6.0)
rmarkdown 2.1 2020-01-20 [2] CRAN (R 3.6.0)
ROCR 1.0-11 2020-05-02 [2] CRAN (R 3.6.0)
RSpectra 0.16-0 2019-12-01 [2] CRAN (R 3.6.0)
rsvd 1.0.3 2020-02-17 [2] CRAN (R 3.6.0)
Rtsne 0.15 2018-11-10 [2] CRAN (R 3.6.0)
scales 1.1.1 2020-05-11 [2] CRAN (R 3.6.0)
sctransform 0.2.1 2019-12-17 [2] CRAN (R 3.6.0)
sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 3.6.0)
Seurat * 3.1.5 2020-04-16 [2] CRAN (R 3.6.0)
stringi 1.4.6 2020-02-17 [2] CRAN (R 3.6.0)
stringr 1.4.0 2019-02-10 [2] CRAN (R 3.6.0)
survival 3.1-12 2020-04-10 [2] CRAN (R 3.6.0)
tibble 3.0.1 2020-04-20 [2] CRAN (R 3.6.0)
tidyr 1.0.3 2020-05-07 [2] CRAN (R 3.6.0)
tidyselect 1.1.0 2020-05-11 [2] CRAN (R 3.6.0)
tsne 0.1-3 2016-07-15 [2] CRAN (R 3.6.0)
uwot 0.1.8 2020-03-16 [2] CRAN (R 3.6.0)
vctrs 0.3.0 2020-05-11 [2] CRAN (R 3.6.0)
viridisLite 0.3.0 2018-02-01 [2] CRAN (R 3.6.0)
withr 2.2.0 2020-04-20 [2] CRAN (R 3.6.0)
xfun 0.13 2020-04-13 [2] CRAN (R 3.6.0)
yaml 2.2.1 2020-02-01 [2] CRAN (R 3.6.0)
zoo 1.8-8 2020-05-02 [2] CRAN (R 3.6.0)
[1] /home/david.mentrup/R/x86_64-redhat-linux-gnu-library/3.4
[2] /home/common/R
[3] /usr/lib64/R/library
[4] /usr/share/R/library